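Indirect time-of-flight (I-ToF) imaging is a widespread way of estimating depth on mobile devices because of its small size and affordable price. Previous works have mainly focused on improving the quality of I-ToF imaging, in particular on curing the effect of multi-path interference (MPI). These investigations are typically carried out in specifically constrained scenarios: at close range, indoors, and under little ambient light. Surprisingly little work has investigated I-ToF quality improvement in real-life scenarios. There, strong ambient light and long distances cause difficulties through increased shot noise and signal sparsity, which result from attenuation under limited sensor power and from light scattering. In this work, we propose a new learning-based, end-to-end depth prediction network. It takes noisy raw I-ToF signals as well as an RGB image and fuses their latent representations through a multi-step approach involving both implicit and explicit alignment. The network predicts a high-quality, long-range depth map aligned to the RGB viewpoint. We test our approach on challenging real-world scenes and show an RMSE improvement of more than 40% on the final depth map compared with the baseline approach.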
translated by Google Translate
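For background on what "raw I-ToF signals" are, here is a minimal sketch of textbook continuous-wave I-ToF depth recovery from four phase-shifted correlation samples, plus the RMSE metric used in the evaluation. It is not the authors' network. It assumes a sinusoidal sample model `offset + amp * cos(phi + theta_k)` and uses Gaussian noise as a crude stand-in for shot noise. The helper names (`tof_depth`, `simulate`, `rmse`) and the 20 MHz modulation frequency are illustrative choices.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def tof_depth(c0, c90, c180, c270, f_mod):
    """Depth from four correlation samples of a CW I-ToF pixel.

    Assumes sample k = offset + amp * cos(phi + theta_k), theta_k in {0, pi/2, pi, 3pi/2};
    real sensors differ in sign conventions.
    """
    phi = np.arctan2(c270 - c90, c0 - c180)  # phase in (-pi, pi]
    phi = np.mod(phi, 2 * np.pi)             # wrap to [0, 2*pi)
    return C * phi / (4 * np.pi * f_mod)     # metres; ambiguous beyond C / (2 * f_mod)

def rmse(pred, gt):
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

def simulate(depth, f_mod, amp=1.0, offset=2.0, noise_std=0.0, rng=None):
    """Synthesize the four samples for given depths; Gaussian noise is a simplification."""
    if rng is None:
        rng = np.random.default_rng(0)
    phi = 4 * np.pi * f_mod * np.asarray(depth, float) / C
    samples = []
    for theta in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2):
        s = offset + amp * np.cos(phi + theta)
        samples.append(s + noise_std * rng.standard_normal(s.shape))
    return samples

f = 20e6                                   # 20 MHz: unambiguous range C / (2 f), about 7.5 m
gt = np.array([0.5, 2.0, 5.0, 7.0])
clean = tof_depth(*simulate(gt, f), f)
weak = tof_depth(*simulate(gt, f, amp=0.05, noise_std=0.02), f)  # weak return, e.g. far away
print(rmse(clean, gt), rmse(weak, gt))
```

A weak signal amplitude relative to the noise (as at long range or in strong ambient light) makes the phase estimate, and hence the depth, noisy. This is the regime the abstract targets.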
Multimodal deep learning has been used to predict clinical endpoints and diagnoses from clinical routine data. However, these models suffer from scaling issues: they have to learn pairwise interactions between each piece of information in each data type, thereby escalating model complexity beyond manageable scales. This has so far precluded a widespread use of multimodal deep learning. Here, we present a new technical approach of "learnable synergies", in which the model only selects relevant interactions between data modalities and keeps an "internal memory" of relevant data. Our approach is easily scalable and naturally adapts to multimodal data inputs from clinical routine. We demonstrate this approach on three large multimodal datasets from radiology and ophthalmology and show that it outperforms state-of-the-art models in clinically relevant diagnosis tasks. Our new approach is transferable and will allow the application of multimodal deep learning to a broad set of clinically relevant problems.
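The scaling argument can be illustrated with one common way of avoiding all pairwise interactions: a small set of latent "memory" vectors cross-attends to the concatenated tokens of all modalities. The score matrix is then (m, n) instead of (n, n). This Perceiver-style bottleneck is an assumed stand-in chosen for illustration. The abstract does not specify that "learnable synergies" work this way. All names, sizes, and random weights below are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def latent_cross_attention(tokens, latents, w_q, w_k, w_v):
    """Single-head cross-attention from m latents to n input tokens.

    The score matrix is (m, n), so cost grows linearly with the number of input
    tokens rather than quadratically as in full pairwise self-attention.
    """
    q = latents @ w_q                                   # (m, d)
    k = tokens @ w_k                                    # (n, d)
    v = tokens @ w_v                                    # (n, d)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (m, n)
    return weights @ v, weights

rng = np.random.default_rng(0)
d = 16
# Hypothetical tokens from three modalities, already embedded to width d.
imaging = rng.standard_normal((64, d))
labs = rng.standard_normal((10, d))
notes = rng.standard_normal((30, d))
tokens = np.concatenate([imaging, labs, notes])        # n = 104
latents = rng.standard_normal((8, d))                  # m = 8 slots (learned in practice)
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, weights = latent_cross_attention(tokens, latents, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (8, 16) (8, 104)
```

Here 8 x 104 = 832 attention scores replace the 104 x 104 = 10,816 that full pairwise attention over all tokens would need. Adding a modality grows the cost linearly.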
The success of deep learning applications critically depends on the quality and scale of the underlying training data. Generative adversarial networks (GANs) can generate arbitrarily large datasets, but their diversity and fidelity are limited. This has recently been addressed by denoising diffusion probabilistic models (DDPMs), whose superiority has been demonstrated on natural images. In this study, we propose Medfusion, a conditional latent DDPM for medical images. We compare our DDPM-based model against GAN-based models, which constitute the current state of the art in the medical domain. Medfusion was trained and compared with (i) StyleGAN-3 on n=101,442 images from the AIROGS challenge dataset to generate fundoscopies with and without glaucoma, (ii) ProGAN on n=191,027 images from the CheXpert dataset to generate radiographs with and without cardiomegaly, and (iii) wGAN on n=19,557 images from the CRCMS dataset to generate histopathological images with and without microsatellite stability. In the AIROGS, CRCMS, and CheXpert datasets, Medfusion achieved lower (=better) FID than the GANs (11.63 versus 20.43, 30.03 versus 49.26, and 17.28 versus 84.31). Also, fidelity (precision) and diversity (recall) were higher (=better) for Medfusion in all three datasets. Our study shows that DDPMs are a superior alternative to GANs for image synthesis in the medical domain.
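To make "latent DDPM" concrete, here is a minimal sketch of the standard DDPM training step in the style of Ho et al. (2020), applied to a latent array. It covers the closed-form forward noising z_t = sqrt(alpha_bar_t) z_0 + sqrt(1 - alpha_bar_t) eps and the simplified noise-prediction loss. The linear beta schedule, the latent shape, and the function names are assumptions for illustration. Medfusion's actual autoencoder, schedule, and conditioning mechanism are not described in the abstract.

```python
import numpy as np

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(z0, t, alpha_bar, rng):
    """Draw z_t ~ q(z_t | z_0) in closed form; returns (z_t, eps)."""
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return zt, eps

def eps_loss(eps_pred, eps):
    """Simplified DDPM objective: mean squared error on the predicted noise."""
    return float(np.mean((eps_pred - eps) ** 2))

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
z0 = rng.standard_normal((4, 8, 8))      # hypothetical autoencoder latent of one image
t = int(rng.integers(0, len(alpha_bar)))  # random timestep
zt, eps = q_sample(z0, t, alpha_bar, rng)
# A conditional denoiser eps_theta(zt, t, label) would be trained to minimise
# eps_loss(eps_theta(zt, t, label), eps); a zero predictor is used here as a placeholder.
print(eps_loss(np.zeros_like(eps), eps))
```

Working in a compressed latent space rather than on full-resolution pixels keeps each denoising step cheap. That cost saving is the usual motivation for latent diffusion.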
The circular coordinates algorithm of de Silva, Morozov, and Vejdemo-Johansson takes as input a dataset together with a cohomology class representing a $1$-dimensional hole in the data; the output is a map from the data into the circle that captures this hole, and that is of minimum energy in a suitable sense. However, when applied to several cohomology classes, the output circle-valued maps can be "geometrically correlated" even if the chosen cohomology classes are linearly independent. It is shown in the original work that less correlated maps can be obtained with suitable integer linear combinations of the cohomology classes, with the linear combinations being chosen by inspection. In this paper, we identify a formal notion of geometric correlation between circle-valued maps which, in the Riemannian manifold case, corresponds to the Dirichlet form, a bilinear form derived from the Dirichlet energy. We describe a systematic procedure for constructing low energy torus-valued maps on data, starting from a set of linearly independent cohomology classes. We showcase our procedure with computational examples. Our main algorithm is based on the Lenstra--Lenstra--Lov\'asz algorithm from computational number theory.
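The abstract states that the main algorithm builds on Lenstra–Lenstra–Lovász (LLL) lattice reduction. Below is a minimal sketch of textbook LLL (delta = 3/4) with exact rational arithmetic. The inner product is given by a positive-definite Gram matrix `G`, a hypothetical stand-in for the Gram matrix of a bilinear form such as the Dirichlet form on the chosen cohomology classes. Rows of `B` are integer coefficient vectors. This is not the authors' full procedure. It only shows how LLL finds a short, nearly orthogonal integer basis with respect to such a form.

```python
from fractions import Fraction

def inner(u, v, G):
    """Bilinear form <u, v>_G = u^T G v."""
    return sum(Fraction(u[i]) * G[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def gram_schmidt(B, G):
    """Gram-Schmidt orthogonalisation of the rows of B under <.,.>_G."""
    Bstar, mu = [], [[Fraction(0)] * len(B) for _ in B]
    for i, b in enumerate(B):
        v = [Fraction(x) for x in b]
        for j in range(i):
            mu[i][j] = inner(b, Bstar[j], G) / inner(Bstar[j], Bstar[j], G)
            v = [vi - mu[i][j] * bj for vi, bj in zip(v, Bstar[j])]
        Bstar.append(v)
    return Bstar, mu

def lll(B, G=None, delta=Fraction(3, 4)):
    """LLL-reduce the integer basis B (list of rows); G defaults to the identity."""
    n = len(B[0])
    if G is None:
        G = [[int(i == j) for j in range(n)] for i in range(n)]
    B = [list(b) for b in B]
    Bstar, mu = gram_schmidt(B, G)
    k = 1
    while k < len(B):
        for j in range(k - 1, -1, -1):              # size reduction: |mu[k][j]| <= 1/2
            q = round(mu[k][j])
            if q != 0:
                B[k] = [bk - q * bj for bk, bj in zip(B[k], B[j])]
                Bstar, mu = gram_schmidt(B, G)      # simple (slow) full recompute
        lhs = inner(Bstar[k], Bstar[k], G)
        rhs = (delta - mu[k][k - 1] ** 2) * inner(Bstar[k - 1], Bstar[k - 1], G)
        if lhs >= rhs:                              # Lovasz condition holds
            k += 1
        else:
            B[k], B[k - 1] = B[k - 1], B[k]
            Bstar, mu = gram_schmidt(B, G)
            k = max(k - 1, 1)
    return B

print(lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]]))
print(lll([[1, 0], [0, 1]], G=[[4, 3], [3, 4]]))
```

In the second call, the standard basis of the integer lattice is reduced under a strongly correlated form. LLL returns the integer combination (-1, 1), whose G-norm squared is 2, much shorter than the original vectors' value of 4. This mirrors replacing correlated classes by suitable integer combinations.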
Computer-aided systems in histopathology are often challenged by various sources of domain shift that considerably impact their performance. We investigated the potential of self-supervised pre-training to overcome scanner-induced domain shifts for the downstream task of tumor segmentation. For this, we present Barlow Triplets to learn scanner-invariant representations from a multi-scanner dataset with local image correspondences. We show that self-supervised pre-training successfully aligned the representations of different scanners, which, interestingly, results in only a limited benefit for our downstream task. We thereby provide insights into the influence of scanner characteristics on downstream applications and contribute to a better understanding of why established self-supervised methods have not yet shown the same success on histopathology data as on natural images.
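As the name suggests, Barlow Triplets builds on the Barlow Twins objective. Below is a minimal NumPy sketch of that base Barlow Twins loss, not of the triplet extension, whose details the abstract does not give. The loss pushes the cross-correlation matrix between embeddings of two paired views toward the identity. In this setting the paired views would be corresponding patches from two scanners. The default `lam` and the random data are illustrative assumptions.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins objective on two (N, D) batches of embeddings of paired views."""
    n = z_a.shape[0]
    # Standardize each feature over the batch (population std).
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = z_a.T @ z_b / n                                   # (D, D) cross-correlation
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)             # invariance: matching features -> 1
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)   # redundancy reduction: others -> 0
    return float(on_diag + lam * off_diag), c

rng = np.random.default_rng(0)
patch = rng.standard_normal((256, 32))                    # hypothetical embeddings, scanner A
same_tissue = patch + 0.1 * rng.standard_normal((256, 32))  # same patches, scanner B
unrelated = rng.standard_normal((256, 32))
print(barlow_twins_loss(patch, same_tissue)[0], barlow_twins_loss(patch, unrelated)[0])
```

Embeddings that agree across scanners drive the diagonal of `c` toward 1 and the loss toward 0. This is the sense in which such an objective "aligns" scanner representations.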
Recent advances in computer vision have shown promising results in image generation. Diffusion probabilistic models in particular have generated realistic images from textual input, as demonstrated by DALL-E 2, Imagen, and Stable Diffusion. However, their use in medicine, where image data typically comprises three-dimensional volumes, has not been systematically evaluated. Synthetic images may play a crucial role in privacy-preserving artificial intelligence and can also be used to augment small datasets. Here we show that diffusion probabilistic models can synthesize high-quality medical imaging data, as we demonstrate for magnetic resonance imaging (MRI) and computed tomography (CT) images. We provide quantitative measurements of their performance through a reader study in which two medical experts rated the quality of the synthesized images in three categories: realistic image appearance, anatomical correctness, and consistency between slices. Furthermore, we demonstrate that synthetic images can be used for self-supervised pre-training and improve the performance of breast segmentation models when data is scarce (Dice score 0.91 vs. 0.95 without vs. with synthetic data).
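The segmentation result is reported as a Dice score, which measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). Below is a minimal sketch for binary masks. The small `eps`, which makes two empty masks score 1.0, is an assumed convention; implementations differ on this edge case.

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient 2|A & B| / (|A| + |B|) for binary masks of any shape."""
    pred = np.asarray(pred, bool)
    target = np.asarray(target, bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # one shared voxel out of 2 + 2, about 0.5
```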
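Learning to detect objects, such as humans, in imagery captured by an unmanned aerial vehicle (UAV) usually suffers from tremendous variations caused by the UAV's position relative to the objects. In addition, existing UAV-based benchmark datasets do not provide adequate dataset metadata, which is essential for precise model diagnosis and for learning features that are invariant to those variations. In this paper, we introduce Archangel, the first UAV-based object detection dataset composed of real and synthetic subsets captured under similar imaging conditions, together with UAV position and object pose metadata. A series of experiments with state-of-the-art object detectors is carefully designed to demonstrate the benefits of leveraging the metadata during model evaluation. Moreover, several crucial insights are presented on involving both real and synthetic data during model fine-tuning. Finally, we discuss the advantages, limitations, and future directions of Archangel to highlight its distinct value for the broader machine learning community.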
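One simple way to "leverage the metadata during model evaluation" is to stratify a detection metric by UAV position. The minimal sketch below computes recall per altitude bin. The record format (`altitude_m`, `num_gt`, `num_tp`) and the 10 m bin width are hypothetical and do not reflect Archangel's actual annotation schema. Matching detections to ground truth (e.g. by IoU) is assumed to have happened upstream.

```python
from collections import defaultdict

def recall_by_altitude(records, bin_width=10.0):
    """Aggregate detection recall per UAV-altitude bin.

    Each record is a hypothetical per-image dict with the UAV altitude in metres
    ("altitude_m"), the number of ground-truth objects ("num_gt"), and the number
    of those matched by the detector ("num_tp").
    """
    tp, gt = defaultdict(int), defaultdict(int)
    for r in records:
        b = (r["altitude_m"] // bin_width) * bin_width  # lower edge of the altitude bin
        tp[b] += r["num_tp"]
        gt[b] += r["num_gt"]
    return {b: tp[b] / gt[b] for b in sorted(gt) if gt[b] > 0}

records = [
    {"altitude_m": 12, "num_gt": 4, "num_tp": 3},
    {"altitude_m": 18, "num_gt": 4, "num_tp": 1},
    {"altitude_m": 45, "num_gt": 2, "num_tp": 0},
]
print(recall_by_altitude(records))
```

A single aggregate recall would hide the fact that, in this toy data, the detector fails entirely at the higher altitude. Per-bin numbers expose that failure directly.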
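Deep learning (DL) methods have shown promising results for solving inverse problems such as MR image reconstruction from $k$-space data. However, these methods currently provide no guarantees on reconstruction quality, and the reliability of such algorithms is only poorly understood. Adversarial attacks offer a valuable tool for understanding possible failure modes and the worst-case performance of DL-based reconstruction algorithms. In this paper, we describe adversarial attacks on multi-coil $k$-space measurements and evaluate them on the recently proposed E2E-VarNet and on a simpler UNet-based model. In contrast to prior work, the attacks are designed to specifically alter diagnostically relevant regions. Using two realistic attack models (adversarial $k$-space noise and adversarial rotations), we show that current state-of-the-art DL-based reconstruction algorithms are indeed sensitive to such perturbations, to the extent that relevant diagnostic information may be lost. Surprisingly, in our experiments the UNet and the more sophisticated E2E-VarNet were similarly sensitive to such attacks. Our findings add to the evidence that caution must be exercised as DL-based methods move closer to clinical practice.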
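The "adversarial $k$-space noise" attack model can be sketched as projected gradient ascent on an additive $k$-space perturbation with an L2 budget, maximizing the reconstruction change inside a region of interest. To keep the sketch self-contained, it uses a single-coil, zero-filled inverse FFT as a linear stand-in for the reconstruction network, which makes the gradient analytic. The E2E-VarNet and UNet models in the abstract are multi-coil networks that would require automatic differentiation instead. The mask, ROI, budget, and step schedule are all hypothetical.

```python
import numpy as np

def recon(k, mask):
    """Stand-in linear reconstruction: zero-filled orthonormal inverse FFT."""
    return np.fft.ifft2(mask * k, norm="ortho")

def roi_error(delta, mask, roi):
    """Squared image change inside the ROI; recon is linear, so recon(k+delta)-recon(k)=recon(delta)."""
    return float(np.sum(np.abs(roi * recon(delta, mask)) ** 2))

def pgd_kspace_attack(mask, roi, eps, steps=50, seed=0):
    """Projected gradient ascent on an additive k-space perturbation, ||delta||_2 <= eps."""
    rng = np.random.default_rng(seed)
    delta = rng.standard_normal(mask.shape) + 1j * rng.standard_normal(mask.shape)
    delta *= 0.01 * eps / np.linalg.norm(delta)
    for _ in range(steps):
        # Ascent direction A^H (roi * A delta), with A = ifft2(mask * .) and binary roi.
        grad = mask * np.fft.fft2(roi * recon(delta, mask), norm="ortho")
        delta = delta + eps * grad / (np.linalg.norm(grad) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta *= eps / norm                          # project back onto the L2 ball
    return delta

n, eps = 32, 1.0
mask = np.zeros((n, n)); mask[:, ::4] = 1; mask[:, n // 2 - 2:n // 2 + 2] = 1  # undersampling
roi = np.zeros((n, n)); roi[10:16, 12:20] = 1                               # "diagnostic" region
adv = pgd_kspace_attack(mask, roi, eps)
rand = np.random.default_rng(1).standard_normal((n, n)) + 0j
rand *= eps / np.linalg.norm(rand)
print(roi_error(adv, mask, roi), roi_error(rand, mask, roi))
```

With the same perturbation norm, the optimized perturbation concentrates its effect in the ROI far more than random noise does. This is what makes targeted attacks a useful worst-case probe.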
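We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill in the middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span. We use these ablations to prescribe strong default settings and best practices for training FIM models. We have released our best infilling model, trained with these best practices, in our API, and we release our infilling benchmarks to aid future research.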
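The data transformation itself fits in a few lines. The minimal sketch below splits a document at two random points and emits prefix, suffix, then middle, separated by sentinel markers. At inference the model sees the prefix and suffix and generates the middle. The sentinel strings, the character-level uniform choice of cut points, and the default `fim_rate` are illustrative assumptions. A real setup would reserve dedicated tokenizer tokens and follow the paper's recommended settings.

```python
import random

# Hypothetical sentinel strings; a real tokenizer would reserve dedicated tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def fim_transform(doc, rng, fim_rate=0.5):
    """With probability fim_rate, move a random middle span of doc to the end."""
    if rng.random() >= fim_rate:
        return doc                      # keep as an ordinary left-to-right example
    # Two cut points drawn uniformly at character level (an assumption).
    i, j = sorted(rng.randint(0, len(doc)) for _ in range(2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

def fim_invert(example):
    """Recover the original document (assumes doc never contains the sentinels)."""
    if not example.startswith(PRE):
        return example
    prefix, rest = example[len(PRE):].split(SUF, 1)
    suffix, middle = rest.split(MID, 1)
    return prefix + middle + suffix

doc = "def add(a, b):\n    return a + b\n"
print(fim_transform(doc, random.Random(0), fim_rate=1.0))
```

Because the middle comes last, the ordinary next-token loss teaches infilling without any change to the model architecture. Untransformed examples keep training plain left-to-right generation.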
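End-user programming of social robot behavior is usually limited to a predefined set of movements. We propose a puppeteering robotic interface that provides a more intuitive way of programming expressive robot movements. As the user manipulates a puppet of the robot, the actual robot replicates the movements, providing real-time visual feedback. With this interface, even novice users with limited training can design and program expressive movements efficiently. We present the results of our preliminary user study.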
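The mirroring loop ("the actual robot replicates the movements") can be sketched as a per-joint mapping from puppet sensor readings to robot joint targets. The sketch clamps each target to the robot's safe range and applies an exponential moving average to suppress jitter. The clamping, the smoothing, and the `JointLimit`/`PuppetMirror` names are a hypothetical design for illustration. The abstract does not describe how the authors' interface maps puppet poses to the robot.

```python
from dataclasses import dataclass

@dataclass
class JointLimit:
    lo: float  # lower joint limit, radians
    hi: float  # upper joint limit, radians

class PuppetMirror:
    """Maps puppet joint readings to robot joint targets (hypothetical design)."""

    def __init__(self, limits, alpha=0.3):
        self.limits = limits  # one JointLimit per robot joint
        self.alpha = alpha    # smoothing factor in (0, 1]; 1 means no smoothing
        self.state = None

    def update(self, puppet_angles):
        # Clamp each puppet reading to the robot's safe range.
        target = [min(max(a, lim.lo), lim.hi) for a, lim in zip(puppet_angles, self.limits)]
        if self.state is None:
            self.state = target
        else:
            # Exponential moving average suppresses sensor jitter from the puppet.
            self.state = [s + self.alpha * (t - s) for s, t in zip(self.state, target)]
        return list(self.state)

mirror = PuppetMirror([JointLimit(-1.0, 1.0), JointLimit(0.0, 2.0)], alpha=0.5)
print(mirror.update([2.0, 1.0]))  # first reading, clamped
print(mirror.update([0.0, 1.0]))  # moves halfway toward the new pose
```

Calling `update` once per sensor frame and sending the result to the robot's joint controller yields the real-time feedback loop. Recording the sequence of states would capture the programmed movement.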